Road-testing the English Resource Grammar Over the British National Corpus

Authors

  • Timothy Baldwin
  • Emily M. Bender
  • Dan Flickinger
  • Ara Kim
  • Stephan Oepen
Abstract

This paper addresses two questions: (1) when a large deep processing resource developed for relatively closed domains is run over open text, what coverage does it have, and (2) what are the most effective and time-efficient ways of consolidating gaps in the coverage of


Similar resources

Identification of Verb-Particle Constructions in English

We propose different syntax-based methods for automatically identifying verb-particle constructions in English. The methods are based on the Deterministic Finite-state Automaton (DFA), Hidden Markov Model (HMM), and Synchronous Context-Free Grammar (SCFG). Our experiments show that the methods can achieve an F-score of 83.3% on our manually annotated test set consisting of Wikipedia articles and B...

The Influence of Prosody and Ambiguity on English Relativization Strategies

We present evidence that, for English, ambiguity is an active factor in the choice of relativization strategy and that, in speech, prosody plays a role in resolution of ambiguity over the internal role of the relativized constituent. The evidence is based on (semi-)automatic analysis and comparison of automatically-parsed written and spoken portions of the British National Corpus (BNC, Leech, 1...

The Syntactically Annotated ICE Corpus and the Automatic Induction of a Formal Grammar

The International Corpus of English is a corpus of national and regional varieties of English. The mega-word British component has been constructed, grammatically tagged, and syntactically parsed. This article is a description of work that aims at the automatic induction of a wide-coverage grammar from this corpus as well as an empirical evaluation of the grammar. It first of all describes the ...

The American National Corpus: More Than the Web Can Provide

The American National Corpus (ANC) project is developing a corpus comparable to the British National Corpus (BNC), covering American English. Recent interest in the web as a source of corpus materials has caused some in the language processing community to suggest that the development of a corpus of American English is unnecessary. However, we argue that far from being rendered superfluous by t...

Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners

Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...

Publication date: 2004